dvc: implement multi-stage dvcfile #3584
Amazing that you've managed to preserve backward compatibility 🔥 I gave it a try and it works pretty nicely! I also like that the commands keep the old behavior for now. We clearly have a lot to clean up here, but for the sake of speed I would be fine merging this once you fix the biggest issues (biggest in your own opinion as well). We can move on to the finer cleanup afterwards.
Pipeline stage files can now house multiple stages. Checksums go into separate lockfiles, so the Dvcfile stays clean, human-readable, and editable. The *.dvc files will still be generated for output files. The feature is available via a hidden flag: -n. Fixes iterative#1871 PR: iterative#3584
@efiop, this should be good to merge. Please take a look (I have fixed all except 1).
@skshetry Thanks! Let's run with it. Merging for now, will play around with the codebase some more while adding build-cache support.
Looking forward to having a go at this! I've been hacking together a script that parses a
I just realized multi-stage doesn't seem to support
Hi, @tall-josh. We are still iterating on this issue, and
Looks like you dived into this, maybe you have some inputs? P.S. And, yes,
Thanks @skshetry. My hope was to use the multi-stage files as a way to declaratively put together a pipeline. I'm not sure if this was the intention, but with that goal in mind:
We have another discussion on #3633 about referencing variables inside dvc (single and multi-stage) files that may be helpful too. These two features would go a long way toward my declarative dvc dream.
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible.
This PR adds the -n flag (hidden at the moment), which creates a multi-stage dvcfile. E.g.:
$ dvc run -n stage1 -d foo -o bar "cat foo foo > bar"
For addressing the given stage:
$ dvc repro Dvcfile:stage1
$ dvc pipeline show --ascii :stage1 # assumes Dvcfile automatically
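The addressing scheme above (a file and a stage name joined by a colon, with the file defaulting to Dvcfile when omitted) could be parsed roughly as follows. This is only an illustrative sketch, not the code from this PR: `parse_stage_address`, `StageAddress`, and `DEFAULT_FILE` are hypothetical names, and the sketch ignores paths that themselves contain a colon.

```python
from typing import NamedTuple, Optional

# Assumption: the default file name is taken from the ":stage1" example above.
DEFAULT_FILE = "Dvcfile"


class StageAddress(NamedTuple):
    """Hypothetical container for a parsed target."""
    path: str
    name: Optional[str]


def parse_stage_address(target: str, default_file: str = DEFAULT_FILE) -> StageAddress:
    """Split 'file:stage' into its parts (illustrative helper, not DVC's API)."""
    # Split on the last colon so everything before it is treated as the file path.
    path, sep, name = target.rpartition(":")
    if not sep:
        # No colon at all: treat the whole target as a file path with no stage name.
        return StageAddress(target, None)
    # An empty file part (":stage1") falls back to the default file.
    return StageAddress(path or default_file, name or None)


print(parse_stage_address("Dvcfile:stage1"))
print(parse_stage_address(":stage1"))
```

Splitting on the last colon keeps the stage name unambiguous in this sketch; a real implementation would also need to handle platform-specific paths.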
Multistage file format:
Lockfile format
Closes #3606